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ABSTRACT 

We consider multichannel sparse recovery problem where the 
objective is to find good recovery of jointly sparse unknown 
signal vectors from the given multiple measurement vectors 
which are different linear combinations of the same known el¬ 
ementary vectors. Many popular greedy or convex algorithms 
perform poorly under non-Gaussian heavy-tailed noise condi¬ 
tions or in the face of outliers. In this paper, we propose the 
usage of mixed norms on data fidelity (residual matrix) 
term and the conventional fg, 2 -norm constraint on the signal 
matrix to promote row-sparsity. We devise a greedy pursuit 
algorithm based on simultaneous normalized iterative hard 
thresholding (SNIHT) algorithm. Simulation studies high¬ 
light the effectiveness of the proposed approaches to cope 
with different noise environments (i.i.d., row i.i.d, etc) and 
outliers. Usefulness of the methods are illustrated in source 
localization application with sensor arrays. 

Index Terms — multichannel sparse recovery, com¬ 
pressed sensing, robustness, iterative hard thresholding 

1. INTRODUCTION 

In the multiple measurement vector (MMV) model, a single 
measurement matrix is utilized to obtain multiple measure¬ 
ment vectors, i.e., = 4>Xi + ei, i = 1,... ,Q where $ is 

the M X N known measurement matrix and are the (unob¬ 
served) random noise vectors. Typically there are more col¬ 
umn vectors cf>^ than row vectors i.e., M < N (under¬ 
determined linear model). It is still possible to recover the 
unknown signal vectors x^, f = 1,..., Q by assuming that 
signals are sparse, i.e., some of the elements are zero. In 
matrix form, the MMV model reads Y = $X + E, where 
Y = (yi • • ■ yg) e X = (xi • • ■ xq) e 

and E = (ei • • • eg) G tpi^xQ gojject the measurement, the 
signal and the error vectors, respectively. When Q = 1, the 
model reduces to standard compressed sensing (CS) model 
[1]. Then, rather than recovering the sparse/compressible tar¬ 
get signals x^ separately using standard CS reconstruction al¬ 
gorithms, one attempts to simultaneously (jointly) recover all 
signals. The key assumption is that locations of nonzero val¬ 
ues primarily coincide, i.e., signal matrix X is iT rowsparse. 
Joint estimation can lead both to computational advantages 


and increased reconstruction accuracy [1-6]. The objective of 
multichannel sparse recovery problem is finding a row sparse 
approximation of the signal matrix X based on knowledge 
of Y, the measurement matrix $ and the sparsity level K. 
Applications include EEG/MEG [1] and direction-of-arrival 
(DOA) estimation of sources in array processing [7]. 

Most greedy CS reconstruction algorithms have been 
extended for solving MMV problems. These methods, 
such as simultaneous normalized iterative hard threshold¬ 
ing (SNIHT) algorithm [6] are guaranteed to perform very 
well provided that suitable conditions (e.g., incoherence of 
$ and non impulsive noise conditions) are met. The derived 
(worst case) recovery bounds depend linearly on ||E|| 2 , so the 
methods are not guaranteed to provide accurate reconstruc¬ 
tion/approximation under heavy-tailed non-Gaussian noise. 
In this paper, we consider different £p^q mixed norms on data 
fidelity (residual matrix) and devise a greedy SNIHT algo¬ 
rithm for obtaining a sparse solution. We focus on mixed 
£\ norms as they can provide robust solutions. As will be 
shown in the sequel, these methods are then based on spatial 
signs [8] of the residuals and therefore are nonparametric in 
nature. For an alternative robust approach, see [9]. 

The paper is organized as follows. In Section 2 we for¬ 
mulate a mixed-norm constrained objective function for the 
MMV problem and motivate the usage of fi-norm or the 
mixed ^ 2 , 1 - and 2 -norms. In Section 3 we formulate the 
greedy SNIHT algorithm whereas Section 4 provides sim¬ 
ulation examples illustrating the improved accuracy of the 
proposed methods in various noise conditions and signal 
to noise ratio (SNR) settings. Finally, effectiveness of the 
methods are illustrated in source localization application with 
sensor arrays in Section 5. 

Notations. Let [n] denote the set {1,..., n} for n G N"*". 
For a matrix A G and an index set T of cardinal¬ 

ity |r| = K, we denote by Ar (resp. A(r)) the M x K 
(resp. K X N) matrix restricted to the columns (resp. rows) 
of A indexed by the set T. The hh column vector of A is 
denoted by and the hermitian transpose of the hh row vec¬ 
tor of A by a(i), A = (ai aAf) = (a(i) a(M))“. 

The row-support of X G is the index set of rows con¬ 

taining non-zero elements: rsupp(X) = {i G [iV] : Xij ^ 
0 for some j}. For p,q G [1, 00 ), the mixed ip^q norm [10] of 


X G £^xQ is defined as 


/ / \ 9 /p\i/<? / \ 1/9 

iixiij>.9= (E(ENi"j ) =(^EN*)iipj 

The mixed norms generalize the usual matrix p-norms: if p = 
q, then ||X||p,p = ||X||p. The £ 2 -norm || • II 2 is called the 
Frobenius norm and will be denoted shortly as || • ||. In the 
same spirit, the usual Euclidean norm on vectors is denoted 
shortly as || • ||. The row-£o quasi-norm of a signal matrix X 
is the number of nonzero rows, i.e., ||X||o = | rsupp(X)|. 
The matrix X is then said to be K-rowsparse if ||X||o < K. 
We use Hk{ ) to denote the hard thresholding operator, for 
a matrix X G HkO^) retains the elements of the K 

rows of X that possess largest £ 2 -norms and set elements of 
the other rows to zero. Notation X|r refers to a sparsified 
version of X such that the entries in the rows indexed by set 
r remain unchanged while all other rows have all entries set 
to 0. 


2. ROBUST MIXED NORM MINIMIZATION 

Our objective is to recover X-rowsparse X in the MMV 
model. For this purpose, we consider the following con¬ 
strained optimization problem: 


the complex case, this approach can be considered optimal 
when the error terms are i.i.d. with (circular) complex 
generalized Gaussian (GG) distribution [11, Example 4] with 
exponent s = 1/2. It is important to realize that minimization 
of fp-norms in {Pp,p) implicitly assumes i.i.d.’ness of the er¬ 
ror terms. Since the measurement matrix Y is in many appli¬ 
cations a space x time matrix as in medical imaging or sensor 
array applications, the i.i.d. assumption of the error terms in 
time/space is often not valid. The benefit of mixed fi-norms, 
such as £ 2,1 and £12 considered here is that they introduce 
couplings [ 10 ] between the coefficients and offer robustness 
in case of dependent heavy-tailed errors or outliers. When the 
errors terms have dependencies in time and/or space, then £ 2,1 
and f 1 2 minimization can offer advantages over f 1 or £2 norm 
approaches. As will be shown later, the usage of £i-norm or 
the mixed fi-norms lead to non-parametric approaches that 
are based on the concept of spatial sign function [ 8 ] which in 
the scalar case (x G C) is defined as 


sign(a;) 


x/\x\, for a; 7 ^ 0 
0 , for a; = 0 


( 1 ) 


In the vector case, sign(x) = ||x|| ^x, = 0 for x 7 ^ 0, = 0. 


3. MIXED NORM SNIHT ALGORITHM 


imnCp,,||Y - $X||^ g subject to ||X||o < AT, 

where Cp ^ is an irrelevant constant used for making nota¬ 
tions compact. For p = q, the problem reduces to conven¬ 
tional fp-norm minimization of the residual matrix R = Y — 
4>X G under rowsparsity constraint on X. The well- 

known problem with /' 2 -norm minimization is that it gives 
a very small weight on small residuals and a strong weight 
on large residuals, implying that even a single large outlier 
can have a large influence on the obtained solution. For ro¬ 
bustness, one should utilize £i in mixed norms since it gives 
larger weights on small residuals and less weight on large 
residuals. In this paper we consider (Pp^q) in the cases that 
p,q € {1, 2}. The problem (Pp,q) is combinatorial (NP-hard). 
Hence suboptimal reduced complexity reconstruction algo¬ 
rithms have been proposed. These can be roughly divided into 
two classes: convex-relaxation algorithms (e.g., [3,7,10]) and 
greedy pursuit (e.g., [2, 6 ]) algorithms. In this paper, we de¬ 
vise a greedy simultaneous NIHT (SNIHT) algorithm for the 
problems (Pi,i) and (^ 2 , 1 )- The case (Pi, 2 ) is excluded due 
to the lack of space, but our approach and discussion straight¬ 
forwardly extends for this mixed £i norm as well. 

In (Pi,i) problem, one aims to minimize || Y — ^'XHi = 
lUij ~ under sparsity constraint, so the solu¬ 

tion can be viewed as a sparse multivariate least absolute devi¬ 
ation (LAD) regression estimator. The LAD regression (in the 
real-valued overdetermined linear regression) is well-known 
to offer robust solution with bounded influence function. In 


Iterative hard thresholding is a projected gradient descent 
method that is known to offer efficient and scalable solution 
for AT-sparse approximation problem [12]. The normalized 
IHT (NIHT) method updates the estimate of X by taking 
steps towards the direction of the negative gradient followed 
by projection onto the constrained space. In our multichannel 
sparse recovery problem, at (n + l)th iteration the SNIHT 
update is 

X "+1 = Pk(X” -f - $X”)) 

where = Vr» ||R||^ g is the complex matrix deriva¬ 

tive [13] with respect to (w.r.t.) R*, /i"+^ > 0 is the stepsize 
for the current iteration and p,q G {1, 2}. For £ 2 - and £ 1 - 
norms the derivatives are easily shown to be 

V’ 2 , 2 (R) = R. and ^/)i,i(R) = sign(R) 

respectively, where notation sign(R) refers to element-wise 
application of the spatial sign function (1), i.e., [sign(R)]y = 
sign(rij). For (2,1) mixed norm, we obtain 

V’2,i(R) = (sign(r(i)) ••• sign(r(M)))“ , 

that is, the vector spatial sign function is applied row-wise 
to the residual matrix R = (r(i) • • • Table 1 pro¬ 

vides the pseudo-code of the greedy SNIHT algorithm for 
the problem (Pp^q), which we call SNIHT(p, q) algorithm for 
short. Note that SNIHT(2, 2) corresponds to the conventional 
SNIHT studied in [ 6 ] and in [12] for Q = I case. 



Algorithm 1: SNIHT(p, q) algorithm 

input ; Y, 4>, sparsity K, mixed norm indices (p, q) 
output ; (X"+^, r"+^) estimates of X and rsupp(X) 
initialize: X° = 0, /i° = 0, r° = 0, n = 0. 

1 rO = rsupp(i?K(^^V'p.,(Y))) 
while halting criterion false do 

2 r; = - $x") 

3 G" = 

4 = CompStepsize(<i>, G", F”, p",p, q) 

5 X"+i = Rif(X" + /i"+iG”) 

6 r"+i = rsupp(X"+i) 

7 n = n + 1 

end 


We now describe the CompStepsize function which 
computes the stepsize update in Step 4. Following the 
approach in [12], assuming that we have identified the correct 
support at nth iteration, then we may look for a stepsize up¬ 
date as the minimizer of || Y — 4*X||^ ^ for the gradient 
ascent direction X” -f /iG"|r". Thus we find /i > 0 as the 
minimizer of the convex function 

IIY - $(X" + pG"|r™) ||;_^ = ||R" - (2) 

where R" = Y - $X" and B” = $r"G(rn) When p = q 
this reduces to minimizing a simple linear regression estima¬ 
tion problem, min^ ||r — /ib||P, where the response is r = 
vec(R”) and the predictor is b = vec(B”). Thus when using 
p = g = 2 as in conventional SNIHT [6], the minimizer of 
(2) is easily found to be = ||G”p„j||^/||$r"G"p„^|p. 
However, for the robust estimators that we are interested in, 
i.e., when using (p, q) = (1,1) and (p, q) = (2,1), a mini¬ 
mizer of (2) can not be found in closed-form. In the (p, q) = 
(1,1) case, it is easy to show that the solution p verfies the 
following fixed point (FP) equation p = where 

Hx = (E )~' E 

i,j i,j 

and R = R" — pB" = (fy) depends also on the unknown 
p. Then, instead of choosing the next update p”+^ as the 
minimizer of (2) which could be found by running the FP it¬ 
erations Pi = 7T(pi_|_i) for i = 0, 1 ,... until convergence 
(with initial value po > 0), we use a 1-step FP iterate which 
corresponds to a single iteration with initial value of iteration 
given by the previous stepsize p". In other words, in Step 4, 
we set p"+^ = iF(p”). In our simulation studies we noticed 
that this 1-step FP iterate often gave a very good approxima¬ 
tion of the true solution (within 3 decimal accuracy). In case 
we use (p, q) = (2,1), it is easy to show that the solution p 
verihes the FP equation p = where 

R*(p) = (El|f(dr'l|b(dl')~'El|fwr'l^<bHr(,)) 




Fig. 1. Average MSE of SNIHT(p, q) methods as a function 
of SNR in (a) CA/"(0, noise and (b) 03(0, cr^) noise. 

and the same approach, i.e., p"+^ = iT*(p"), is used for 
computing the stepsize update. 

4. SIMULATION STUDIES 

Next we illustrate the usefulness of the methods in a vari¬ 
ety of noise environments and SNR levels. Also, the effect 
of number of measurement vectors Q on recovery probabil¬ 
ity will be illustrated. The elements of $ are drawn from 
CA/'(0,1) distribution and the columns are unit-norm normal¬ 
ized. The coefficients of K non-zero row vectors of X have 
equal amplitudes ax = \xij\ — 1 Vi € F, j = 1,... ,Q and 
uniform phases, i.e., Aig{xij) ~ (7ni/(0, 27r). The support 
F = supp(X) is randomly chosen from {1,..., N} without 
replacement for each trial. We define the (generalized) sig¬ 
nal to noise ratio (SNR) as SNR(cr) = 10 log^o)^^/^^) = 
—201og2o O’ which depends on the scale parameter a of the 
error distribution. For robustness purposes, we will study the 
performance in i.i.d. complex circular f-distributed noise with 
1 / degrees of freedom (d.o.f.), ^ CG(0, cr^), when v <b 

and the scale parameter is a^ = Medi?^ (je^ p). This is an ex¬ 
ample of a heavy-tailed distribution with v = 1 corresponding 
to Cauchy distribution. Also note that at the limit v ^ 00 one 
obtains the complex Gaussian distribution. 

As performance measures of sparse signal recovery, we 
use both the (observed) mean squared error MSE(X) = 
“ X^ll^ and the empirical probability of 
exact recovery, PER = p = FI^I), where I(-) 

denotes the indicator function, X^^l and fI^I = rsupp(XM) 
denote the estimate of the iT-sparse signal Xl^l and the signal 
support F[^1 for the fth Monte Carlo (MC) trial, respectively. 
In all simulation settings described below, all the reporeted 
hgures are averages over L = 2000 MC trials, the length 
of the signal is (V = 512, the number of measurements is 
M = 256, and the sparsity level is K = 8. The number of 
measurement vectors is Q = 16 unless otherwise stated. 

Figure 1(a) depicts the MSE as a function of SNR in 
i.i.d. circular Gaussian noise, Cij ^ CA/’(0, tr^), where 





















SNR (dB) 

2 

4 

6 

8 

10 

12 

14 

16 

SNIHT(2, 2) 

0 

0 

.6 

.61 

.94 

.99 

.99 

1.0 

SNIHT(1,1) 

0 

.25 

.91 

1.0 

1.0 

1.0 

1.0 

1.0 

SNIHT(2,1) 

0 

.02 

.38 

.96 

1.0 

1.0 

1.0 

1.0 


Table 1. PER rates in Ct 3 ( 0 ,cr) distributed noise as a func¬ 
tion of SNR (dB). System parameters were (M, K, Q) = 
(256,512,8,16). 


= E[|eijp]. As expected, the conventional SNIHT(2,2) 
has the best performance, but SNIHT(2,1) suffers a neg¬ 
ligible 0.07 dB loss, whereas SNIHT(1,1) attain 1.07 dB 
performance loss. Note that SNR = 6 dB is the cutline for 
which all methods had full PER rate (= 1). Erom 4 dB the 
PER rate declines and reaches 0 at SNR = 0 dB for all of the 
methods. 

Next we study the performance in f-distributed noise with 
v = 3 d.o.f. Note that 03 ( 0 , 17 ) distribution has a finite 
variance so we can expect that also SN1HT(2, 2) can still 
work reliably in this setting. Eigure 1(b) which depict the 
MSE vs SNR illustrates severe degradation in reconstruction 
performance for the SN1HT(2,2). This is further illustrated 
in Table 1 which provides the PER rates for the considered 
SNlHT(p, q) methods. Note that the decline of PER rate starts 
much earlier for the conventional SNIHT than for the robust 
methods. 

Eigure 2(a) depicts the MSE of the methods in t-distributed 
noise of SNR((t) = 10 dB and d.o.f. 1 / varying in 1 / S [1, 5]. 
We observe that SNIHT(1,1) has the best performance as it 
retains low MSE for all values of i/. This is in deep contrast 
to SNIHT(2, 2) which starts an exponential increase at 1 / < 3, 
reaching sky-high MSE levels in Cauchy noise (ly = 1). The 
PER rates in Table 2 further illustrates the remarkable per¬ 
formance of the robust methods. Note that SN1HT(1,1) is 
able to maintain full PER rates for all values of ly, whereas 
SN1HT(2, 2) fails completely for v <3. 

The usefulness of joint recovery becomes more pro¬ 
nounced at low SNR’s, where multiple measurements can 
dramatically improve on the recovery by exploiting the joint 
information. This is illustrated in our next simulation set up, 
where d.o.f. v of the f-distributed noise is fixed at 1 / = 3 
and the SNR is 10 dB. Eigure 2(b) depicts the PER rates for 
increasing number of measurement vectors Q. As can be 
seen, the PER rate increases as a function of Q from poor 
14% (when Q = 2) to near full 100% recovery (when (5 = 6 ) 
when using SN1HT(1,1) method. Again, SN1HT(2,1) is 
slightly behind in performance to SN1HT(1,1). Conven¬ 
tional SN1HT(2, 2) is drastically behind the robust methods, 
reaching highest 96.6% rate when Q = 18. This is again in 
deep contrast with near 100% PER obtained by SN1HT(1,1) 
method only with (5 = 6 samples. 




Fig. 2. (a) MSE of SNIHT(p, q) methods in Cfjy(0, cr^) noise 
as a function of v, (b) Empirical PER rates of SNIHT(p, q) 
methods as a function of Q in Cf 3 ( 0 , cr^) noise. In both set¬ 
ting, the SNR was SNR(cr) = 10 dB. 


SNIHT 

Degrees of freedom 1 / 


1 

1.25 

1.5 

1.75 

2 

3 

4 

5 

( 2 , 2 ) 

0 

0 

0 

0 

.04 

.94 

.99 

1.0 

( 2 , 1 ) 

0 

.07 

.55 

.90 

.98 

1.0 

1.0 

1.0 

( 1 , 1 ) 

1.0 

1.0 

1.0 

1.0 

1.0 

1.0 

1.0 

1.0 


Table 2. PER rates in i.i.d. Cfi/(0,tT^) noise for different 
d.o.f. V and SNRjcr) = 10 dB. System parameters were 
(M, N, K, Q) = (256,512,8,16). 


5. APPLICATIONS TO SOURCE LOCALIZATION 

Consider a sensor array consisting of M sensors that receives 
K narrowband incoherent farfield plane-wave sources from a 
point source (M > K). At discrete time t, the array output 
(snapshot) y{t) € C™ is a weighted linear combination of the 
signal waveforms x(f) = ..., XK{t))^ corrupted by 

additive noise e(f) G C^, y{t) = A(6)x(f) -|- e(f), where 
A = A{9) is the M x K steering matrix parametrized by 
the vector 6 = {9i,..., Ok)^ of (distinct) unknown DOA’s 
of the sources. Each column vector a.{0i), called the steering 
vector, represents a point in known array manifold a(0). We 
assume that the number of sources K is known. 

As in [7], we cast the source localization problem as a 
multichannel sparse recovery problem as follows. We con¬ 
struct an overcomplete M x N steering matrix A{0), where 
6 = ( 01 ,..., 9n)^ represents a sampling grid of all source 
locations of interest. Suppose that 9 contains the true DOA’s 
9i, i = 1,..., AT. In this case the measurement matrix Y = 
(y(^i) ■■■ y(^Q)) S consisting of snapshots at 

time instants ti,... ,tQ can be exactly modelled as MMV 
model in which the signal matrix X G is iT-rowsparse 

matrix, whose K non-zero row vectors correspond to source 
signal sequences. Thus finding the DOA’s of the sources is 
equivalent to identifying the support E = supp(X). Since 
the steering matrix A{9) is known, we can use SNIHT meth¬ 
ods to identify the support. 





































We assume that K — 2 independent (spatially and tem¬ 
porally) complex circular Gaussian source signals of equal 
power cr^ arrive on uniform linear array (ULA) of M = 20 
sensors with half a wavelength inter-element spacing from 
DOA’s 6 i = 0° and 62 = 8°. In this case, the array manifold 
isa(6») = The noise 

matrix E € has i.i.d. row vectors, each row vector e(i) 

having complex Q-variate inverse Gaussian compound Gaus¬ 
sian (IG-CG) distribution [14] with shape parameter A = 0.1 
and covariance matrix Cov(e(j)) = Iq. Note that the covari¬ 
ance of the snapshot is Cov(y(ti)) = a^A(0)A(0)^ + Im, 
so we may use the popular MUSIC method to localize the 
sources. In other words, we search for the K = 2 peaks of 
the MUSIC pseudospectrum in the grid. We use a uniform 
grid 9 on [—90, 90] with 2° degree spacing, thus containing 
the true DOA’s. In Step 1 of SNIHT(p, q) algorithm, we lo¬ 
cate the K largest peaks of rownorms of ^(Y) instead 

of taking r° as indices of K largest rownorms of ^(Y). 

We then identify the support (which gives the DOA esti¬ 
mates) for all the methods over 1000 MC trials and compute 
the PER rates and the relative frequency of DOA estimates 
in the grid. Full PER rate = 1 implies that the support E 
correctly identihed the true DOA’s in all MC trials. Such a 
case is shown in upper plot of Figure 3 for the SNIHT(1,1) 
and SNIHT(2,1) when the number of snapshots is Q = 50 
and the SNR is -10 dB. The PER rates of SNIHT(2, 2) and 
MUSIC were considerably lower, 0.81 and 0.73, respectively. 
Next we keep other parameters hxed, but decrease the SNR 
to -20 dB. In this case, the MUSIC method fails completely 
and provides nearly a uniform frequency on the grid. This is 
illustrated in lower plot of Figure 3. Note that the proposed 
robust methods, SNIHT(1,1) and SNIHT(2,1), provide high 
peaks on the correct DOA’s. The PER rates of SNIHT(2,1), 
SNIHT(1,1), SNIHT(2, 2) and MUSIC were 0.70, 0.64, 0.11 
and 0.05, respectively. Hence the mixed fi-norm method 
SNIHT(2,1) has the best recovery performance. In conclus- 
tion, robust sparse recovery methods can offer considerable 
improvements in performance when the measurement envi¬ 
ronment is challenging (low SNR, small Q) 
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